Abstract

Local learning methods, such as local linear regression and nearest neighbor classifiers, base estimates on nearby
training samples, called neighbors. Usually the number of neighbors used in estimation is fixed to be a global “optimal”
value,  chosen  by  cross-validation.  This  paper  proposes  adapting  the  number  of  neighbors  used  for  estimation  to 
the  local  geometry  of  the  data,  without  need  for  cross-validation.  The  term  enclosing  neighborhood  is  introduced 
to  describe  a  set  of  neighbors  whose  convex  hull  contains  the  test  point  when  possible.  It  is  proven  that  enclosing 
neighborhoods  yield  bounded  estimation  variance  under  some  assumptions.  Three  such  enclosing  neighborhood 
definitions are presented: natural neighbors, natural neighbors inclusive, and enclosing k-NN. The effectiveness
of  these  neighborhood  definitions  with  local  linear  regression  is  tested  for  estimating  look-up  tables  for  color 
management.  Significant  improvements  in  error  metrics  are  shown,  indicating  that  enclosing  neighborhoods  may  be 
a  promising  adaptive  neighborhood  definition  for  other  local  learning  tasks  as  well,  depending  on  the  density  of 
training  samples. 

Local learning, which includes nearest neighbor classifiers, linear interpolation, and local linear regression,
has been shown to be an effective approach for many learning tasks [1]-[5], including color management
[6]. Rather than fitting a complicated model to the entire set of observations, local learning fits a simple model
to only a small subset of observations in a neighborhood local to each test point. An open issue in local learning
is how to define an appropriate neighborhood to use for each test point. In this paper we consider neighborhoods
for local linear regression that automatically adapt to the geometry of the data, thus requiring no cross-validation.
The neighborhoods investigated, which we term enclosing neighborhoods, enclose a test point in the convex hull
of the neighborhood when possible. We prove that if a test point is in the convex hull of the neighborhood, then
the variance of the local linear regression estimate is bounded by the variance of the measurement noise.

We  apply  our  proposed  adaptive  local  linear  regression  to  printer  color  management.  Color  management  refers 
to  the  task  of  controlling  color  reproduction  across  devices.  Many  commercial  industries  require  accurate  color,  for 
example for the production of catalogs and the reproduction of artwork. In addition, the rising ubiquity of cheap
color printers and the growing sources of digital images have recently led to increased consumer demand for accurate
color reproduction.

Given  a  CIELAB  color  one  would  like  to  reproduce,  the  color  management  problem  is  to  determine  what  RGB 
color  one  must  send  the  printer  to  minimize  the  error  between  the  desired  CIELAB  color  and  the  CIELAB  color 
that  is  actually  printed.  When  applied  to  printers,  color  management  poses  a  particularly  challenging  problem. 
The  output  of  a  printer  is  a  nonlinear  function  that  depends  on  a  variety  of  non-trivial  factors,  including  printer 
hardware,  the  halftoning  method,  the  ink  or  toner,  paper  type,  humidity,  and  temperature  [6]-[8].  We  take  the 
empirical  characterization  approach:  regression  on  sample  printed  color  patches  that  characterize  the  printer. 

Other researchers have shown that local linear regression is a useful regression method for printer color management,
producing the smallest $\Delta E^*_{94}$ reproduction errors when compared to other regression techniques, including
neural nets, polynomial regression, and tetrahedral inversion [6, Section 5.10.5.1]. In that previous work, the local
linear regression was performed over neighborhoods of k = 15 nearest neighbors, a heuristic known to produce
good results [9].


This  paper  begins  with  a  review  of  local  linear  regression  in  Section  I.  Then  neighborhoods  for  local  learning  are 
discussed in Section II, including our proposed adaptive neighborhood definitions. The color management problem
and  experimental  setup  are  discussed  in  Section  III  and  results  are  presented  in  Section  IV.  We  consider  the  size 
of  the  different  neighborhoods  in  Section  V,  both  theoretically  and  experimentally.  The  paper  concludes  with  a 
discussion  about  neighborhood  definitions  for  learning. 


I.  Local  Linear  Regression 

Linear regression is widely used in statistical estimation. The benefits of a linear model are its simplicity and
ease  of  use,  while  its  major  drawback  is  its  high  model  bias:  if  the  underlying  function  is  not  well  approximated  by 
an  affine  function,  then  linear  regression  produces  poor  results.  Local  linear  regression  exploits  the  fact  that,  over  a 
small  enough  subset  of  the  domain,  any  sufficiently  nice  function  can  be  well-approximated  by  an  affine  function. 

Suppose that for an unknown function $f : \mathbb{R}^d \to \mathbb{R}$, we are given a set of inputs $X = \{x_1, \ldots, x_n\}$, where
$x_i \in \mathbb{R}^d$, and outputs $y = \{y_1, \ldots, y_n\}$, where $y_i \in \mathbb{R}$. The goal is to estimate the output $\hat{y}$ for an arbitrary test
point $g \in \mathbb{R}^d$. To form this estimate, local linear regression fits the least-squares hyperplane to a local neighborhood
$\mathcal{J}_g$ of the test point, $\hat{y} = \hat{\beta}^T g + \hat{\beta}_0$, where

$$(\hat{\beta}, \hat{\beta}_0) = \arg\min_{(\beta, \beta_0)} \sum_{x_j \in \mathcal{J}_g} \left( y_j - \beta^T x_j - \beta_0 \right)^2. \qquad (1)$$
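
As an illustration of the fit in (1), the following is a minimal sketch, assuming NumPy and assuming the neighborhood has already been selected. The function name `local_linear_estimate` and its arguments are hypothetical and are not taken from the experiments reported here.

```python
import numpy as np

def local_linear_estimate(X_nbrs, y_nbrs, g):
    """Fit the least-squares hyperplane of eq. (1) to a neighborhood and evaluate it at g.

    X_nbrs: (m, d) array of neighbor inputs, y_nbrs: (m,) outputs, g: (d,) test point.
    """
    X_nbrs = np.asarray(X_nbrs, dtype=float)
    y_nbrs = np.asarray(y_nbrs, dtype=float)
    # Append a column of ones so that the intercept beta_0 is fit jointly with beta.
    A = np.hstack([X_nbrs, np.ones((X_nbrs.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y_nbrs, rcond=None)
    beta, beta0 = coef[:-1], coef[-1]
    return float(beta @ np.asarray(g, dtype=float) + beta0)
```

When the neighborhood has fewer than d + 1 points in general position, the least-squares problem is underdetermined; `np.linalg.lstsq` then returns the minimum-norm solution, which is one convention among several.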

The number of neighbors in $\mathcal{J}_g$ plays a significant role in the estimation result. Neighborhoods that include
too many training points can result in regressions that oversmooth. Conversely, neighborhoods with too few points
can result in regressions with incorrectly steep extrapolations. One approach to reducing the estimation variance
incurred by small neighborhoods is to regularize the regression, for example by using ridge regression [5], [10].
Ridge regression forms a hyperplane fit as in equation (1), but the coefficients $\hat{\beta}$ instead minimize a penalized
least-squares criterion that discourages fits with steep slopes. Explicitly,


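$$(\hat{\beta}, \hat{\beta}_0) = \arg\min_{(\beta, \beta_0)} \sum_{x_j \in \mathcal{J}_g} \left( y_j - \beta^T x_j - \beta_0 \right)^2 + \lambda \beta^T \beta, \qquad (2)$$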


where the parameter $\lambda$ controls the trade-off between minimizing the error and penalizing the magnitude of the
coefficients. Larger $\lambda$ results in lower estimation variance, but higher estimation bias. Although we found no
literature using regularized local linear regression for color management, its success for other applications motivated
its inclusion in our experiments.
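
The penalized fit in (2) has a closed form once the data are centered, because the intercept is not penalized. The sketch below assumes NumPy; the function name `local_ridge_estimate` is hypothetical, and it returns the slope vector alongside the estimate only to make the shrinkage visible.

```python
import numpy as np

def local_ridge_estimate(X_nbrs, y_nbrs, g, lam=0.1):
    """Evaluate the ridge fit of eq. (2) at g; only the slope beta is penalized."""
    X_nbrs = np.asarray(X_nbrs, dtype=float)
    y_nbrs = np.asarray(y_nbrs, dtype=float)
    x_mean, y_mean = X_nbrs.mean(axis=0), y_nbrs.mean()
    Xc, yc = X_nbrs - x_mean, y_nbrs - y_mean
    d = X_nbrs.shape[1]
    # With an unpenalized intercept, beta solves the centered ridge normal equations.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ yc)
    beta0 = y_mean - beta @ x_mean
    return float(beta @ np.asarray(g, dtype=float) + beta0), beta
```

Setting `lam=0` recovers the fit in (1) whenever the centered inputs have full column rank; any positive `lam` shrinks the slope toward zero and pulls extrapolated estimates toward the neighborhood mean.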

II.  Enclosing  Neighborhoods 

For  any  local  learning  problem,  the  user  must  define  what  is  to  be  considered  local  to  a  test  point.  Two  standard 
methods  each  specify  a  fixed  constant:  either  in  the  form  of  the  number  of  neighbors  k,  or  the  bandwidth  of  a 
symmetric  distance-decaying  kernel.  For  kernels  such  as  the  Gaussian,  the  term  “neighborhood”  is  not  quite  as 
appropriate, since all training samples receive some weight. However, a smaller bandwidth does correspond to a
more compact weighting of nearby training samples. Commonly, the neighborhood size k or the kernel bandwidth is
chosen by cross-validation over training samples [5]. For many applications, including the printer color management
problem  considered  in  this  paper,  cross-validation  can  be  impractical.  Consider  that  even  if  some  data  were  set 
aside  for  cross-validation,  patches  would  have  to  be  printed  and  measured  for  each  possible  value  of  k.  This  makes 
cross-validation  over  more  than  a  few  specific  values  of  k  highly  impractical.  Instead,  it  will  be  useful  to  define  a 
neighborhood  that  locally  adapts  to  the  data,  without  need  for  cross-validation. 

Prior work in adaptive neighborhoods for k-NN has largely focused on locally adjusting the distance metric
[11]-[20]. The rationale behind these adaptive metrics is that many feature spaces are not isotropic and the
discriminability provided by each feature dimension is not constant throughout the space. However, we do not
consider such adaptive metric techniques appropriate for the color management problem because the feature space
is the CIELAB colorspace, which was painstakingly designed to be approximately perceptually uniform with three
feature dimensions that are approximately perceptually orthogonal.

Other  approaches  to  defining  neighborhoods  have  been  based  on  relationships  between  training  points.  In  the 
symmetric  k  nearest  neighbor  rule,  a  neighborhood  is  defined  by  the  test  sample’s  k  nearest  neighbors  plus  those 
training samples for which the test sample is a k nearest neighbor [21]. Zhang et al. [22] called for an “intelligent
selection of instances” for local regression. They proposed a method called k-surrounding neighbor (k-SN) with
the idea of selecting a pre-set number k of training points that are close to the test point, but that are also
“well-distributed” around the test point. Their k-SN algorithm selects nearest neighbors in pairs: first the nearest neighbor
z not yet in the neighborhood is selected, then the next nearest neighbor that is farther from z than it is from the
test point is added to the neighborhood. Although this technique locally adapts to the spatial distribution of the
training samples, it does not offer a method for adaptively choosing the neighborhood size k. Another spatially-based
approach uses the Gabriel neighbors of the test point as the neighborhood [23], [24, pg. 90].

We present three neighborhood definitions that automatically specify $\mathcal{J}_g$ based on the geometry of the training
samples, and show how these neighborhoods provide a robust estimate in the presence of noise. Because each of
the three neighborhood definitions attempts to “enclose” the test point in the convex hull of the neighborhood, we
introduce the term enclosing neighborhood to refer to such neighborhoods. Given a set of training points $X$ and
test point $g$, a neighborhood $\mathcal{J}_g$ is an enclosing neighborhood if and only if $g \in \mathrm{conv}(\mathcal{J}_g)$ when $g \in \mathrm{conv}(X)$.
Here, the convex hull of a set $S = \{s_1, \ldots, s_n\}$ with $n$ elements is defined as
$\mathrm{conv}(S) = \{\sum_{i=1}^{n} w_i s_i \mid w_i \geq 0, \sum_{i=1}^{n} w_i = 1\}$. The intuition behind regression on an enclosing neighborhood is that interpolation provides a
more robust estimate than extrapolation. This intuition is formalized in the following theorem.

Theorem 1: Consider a test point $g \in \mathbb{R}^d$ and a neighborhood $\mathcal{J}_g$ of $n$ training points $\{x_1, \ldots, x_n\}$ where $x_j \in \mathbb{R}^d$.
Suppose $g$ and each $x_j$ are drawn independently and identically from a sufficiently nice distribution, such that all
points are in general position with probability one. Let $f(g) = a^T g + a_0 + \omega_0$ and $f(x_j) = a^T x_j + a_0 + \omega_j$,
where $a \in \mathbb{R}^d$, $a_0 \in \mathbb{R}$, and let each component of the additive noise vector $\omega = (\omega_0, \ldots, \omega_n)^T \in \mathbb{R}^{n+1}$ be
independent and identically distributed according to a distribution with finite mean and finite variance $\sigma^2$. Given
the measurements $\{f(x_1), \ldots, f(x_n)\}$, consider the linear estimate $\hat{f}(g) = \hat{\beta}^T g + \hat{\beta}_0$, where $(\hat{\beta}, \hat{\beta}_0)$ solve

$$(\hat{\beta}, \hat{\beta}_0) = \arg\min_{(\beta, \beta_0)} \sum_{x_j \in \mathcal{J}_g} \left( f(x_j) - \beta^T x_j - \beta_0 \right)^2. \qquad (3)$$

Then if $g \in \mathrm{conv}(\mathcal{J}_g)$, the estimation variance is bounded by $\sigma^2$.


The proof is given in Appendix A. Note that if $g \notin \mathrm{conv}(X)$ for training set $X$, then the enclosing neighborhood
$\mathcal{J}_g$ cannot satisfy $g \in \mathrm{conv}(\mathcal{J}_g)$, and there is no bound on the estimation variance. In the limit of the number of
training samples $n \to \infty$, $P\{g \in \mathrm{conv}(X)\} = 1$ [3, Theorem 3, p. 776]. However, the curse of dimensionality
dictates that for a training set $X$ with finite $n$ elements and a test point $g$ drawn iid in $\mathbb{R}^d$, the probability
that $g \in \mathrm{conv}(X)$ decreases as $d$ increases [5], [25]. This suggests that enclosing neighborhoods are best-suited
for regression when the number of training samples $n$ is high relative to the number of feature dimensions $d$, such
as in the color management problem.
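
The bound in Theorem 1 can be checked numerically on small examples. The sketch below assumes NumPy and takes "estimation variance" to mean the variance of the least-squares estimate at $g$ with the neighbor locations held fixed, which is an interpretation rather than a restatement of the proof in Appendix A. Under that reading the estimate is a linear combination $h^T y$ of the measurements, so its variance is $\sigma^2 \|h\|^2$; the function name `variance_factor` is hypothetical.

```python
import numpy as np

def variance_factor(X_nbrs, g):
    """Return v such that Var[f_hat(g)] = v * sigma^2 for the least-squares fit in (3)."""
    X_nbrs = np.asarray(X_nbrs, dtype=float)
    A = np.hstack([X_nbrs, np.ones((X_nbrs.shape[0], 1))])
    g_aug = np.append(np.asarray(g, dtype=float), 1.0)
    # f_hat(g) = g_aug^T (A^T A)^{-1} A^T y = h^T y, a linear combination of the measurements.
    h = A @ np.linalg.solve(A.T @ A, g_aug)
    # With iid noise of variance sigma^2 on each measurement, Var[h^T y] = sigma^2 * ||h||^2.
    return float(h @ h)

square = [[0, 0], [1, 0], [0, 1], [1, 1]]
inside = variance_factor(square, [0.5, 0.5])    # test point enclosed by the neighbors
outside = variance_factor(square, [3.0, 3.0])   # test point far outside the convex hull
```

For the four corners of the unit square, the factor is 0.25 at the center and 12.75 at (3, 3), illustrating how extrapolation inflates the variance while interpolation keeps it at or below the noise variance.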

Next,  we  describe  three  examples  of  enclosing  neighborhoods. 


A.  Natural  Neighbors 

Natural neighbors are an example of an enclosing neighborhood [26], [27]. The natural neighbors are defined by
the Voronoi tessellation $V$ of the training set and test point $\{X, g\}$. Given $V$, the natural neighbors of $g$ are defined
to be those training points $x_j$ whose Voronoi cells are adjacent to the cell containing $g$. An example of the natural
neighbors is shown in the left diagram of Figure 1.

The local coordinates property of the natural neighbors [26] can be used to prove that the natural neighbors
form an enclosing neighborhood when $g \in \mathrm{conv}(X)$. Natural neighbors are commonly used for 3D interpolation with a specific
generalized linear interpolation formula called natural neighbors interpolation; Theorem 1 suggests that they
may be useful for local linear regression as well. We were unable to find examples where the natural
neighbors were used as a neighborhood definition for local regression or nearest neighbor classification. One issue
with natural neighbors for general learning tasks is computational cost: the complexity of computing the Voronoi tessellation of $n$ points
in $d$ dimensions is $O(n \log n)$ when $d < 3$ and $O((n/d)^{d/2})$ when $d \geq 3$ [28].
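
Because the Delaunay triangulation is the dual of the Voronoi tessellation, the natural neighbors of $g$ can be read off as the vertices that share a Delaunay edge with $g$. The sketch below assumes SciPy's `scipy.spatial.Delaunay` and its `vertex_neighbor_vertices` attribute; the function name `natural_neighbors` is hypothetical, and the whole triangulation is recomputed for each test point, which is simple but not efficient.

```python
import numpy as np
from scipy.spatial import Delaunay

def natural_neighbors(X, g):
    """Return sorted indices into X of the natural neighbors of test point g."""
    X = np.asarray(X, dtype=float)
    # Triangulate the training set together with the test point (appended last).
    pts = np.vstack([X, np.asarray(g, dtype=float)])
    tri = Delaunay(pts)
    indptr, indices = tri.vertex_neighbor_vertices
    g_idx = len(X)
    # Vertices sharing a Delaunay edge with g have Voronoi cells adjacent to the cell of g.
    return np.sort(indices[indptr[g_idx]:indptr[g_idx + 1]])
```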


B.  Natural  Neighbors  Inclusive 


The natural neighbors may include a far training sample $x_i$ but exclude a nearer sample $x_j$. We propose a variant,
natural neighbors inclusive, which consists of the natural neighbors and all training points within the distance to
the furthest natural neighbor, $\max_{x_j \in \mathcal{J}_g} \|g - x_j\|_2$. That is, given the set of natural neighbors $\mathcal{J}_g$, the inclusive
natural neighbors of $g$ are

$$\left\{ x_j \in X \;\middle|\; \|g - x_j\|_2 \leq \max_{x_i \in \mathcal{J}_g} \|g - x_i\|_2 \right\}. \qquad (4)$$

This is equivalent to choosing the smallest k-NN neighborhood that includes the natural neighbors.


An  example  of  the  natural  neighbors  inclusive  neighborhood  is  shown  in  the  middle  diagram  of  Figure  1. 
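
Given the indices of the natural neighbors, the inclusive set in (4) is a single distance threshold. The sketch below assumes NumPy; the function name `natural_neighbors_inclusive` and the argument `nn_idx` (indices of the natural neighbors, computed elsewhere) are hypothetical.

```python
import numpy as np

def natural_neighbors_inclusive(X, g, nn_idx):
    """Indices of all training points within the distance to the furthest natural neighbor (eq. 4)."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X - np.asarray(g, dtype=float), axis=1)
    # Inclusion radius: distance from g to its furthest natural neighbor.
    radius = dist[np.asarray(nn_idx)].max()
    return np.flatnonzero(dist <= radius)
```

The length of the returned index array is the k of the smallest k-NN neighborhood that contains every natural neighbor.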

C. Enclosing k-NN Neighborhood

The linear model in local linear regression may oversmooth if the neighbors are far from the test point $g$. To reduce
this risk, we propose the neighborhood of the k nearest neighbors with the smallest k such that $g \in \mathrm{conv}(\mathcal{J}_g(k))$,
where $\mathcal{J}_g(k)$ denotes the k nearest neighbors of $g$. Note that no such k exists if $g \notin \mathrm{conv}(X)$. Therefore, it is
helpful to define the concept of a distance to enclosure. Given a test point $g$ and a neighborhood $\mathcal{J}_g$, the distance
to enclosure is

$$D(g, \mathcal{J}_g) = \min_{z \in \mathrm{conv}(\mathcal{J}_g)} \|g - z\|_2. \qquad (5)$$

Note that $D(g, \mathcal{J}_g) = 0$ if $g \in \mathrm{conv}(\mathcal{J}_g)$.

Using this definition, the enclosing k-NN neighborhood of $g$ is given by $\mathcal{J}_g(k^*)$ where

$$k^* = \min \{ k \mid D(g, \mathcal{J}_g(k)) = D(g, X) \}.$$

If $g \in \mathrm{conv}(X)$, this is the smallest $k$ such that $g \in \mathrm{conv}(\mathcal{J}_g(k))$, while if $g \notin \mathrm{conv}(X)$ this is the smallest $k$
such that $g$ is as close as possible to the convex hull of $\mathcal{J}_g(k)$.

An  example  of  an  enclosing  k-NN  neighborhood  is  given  in  the  right  diagram  of  Figure  1.  An  algorithm  for 
computing  the  enclosing  k-NN  is  given  in  Appendix  B. 
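
For intuition only, the definition of $k^*$ can be evaluated by brute force. The sketch below is not the algorithm of Appendix B: it assumes NumPy, approximates the distance to enclosure in (5) with a Frank-Wolfe iteration (a swapped-in generic solver), and replaces the exact equality in the definition of $k^*$ with a tolerance. The names `distance_to_enclosure`, `enclosing_knn`, `gap_tol`, and `tol` are hypothetical.

```python
import numpy as np

def distance_to_enclosure(P, g, max_iter=20000, gap_tol=1e-12):
    """Approximate D(g, P) = min over z in conv(P) of ||g - z||_2 (eq. 5) by Frank-Wolfe."""
    P = np.asarray(P, dtype=float)
    g = np.asarray(g, dtype=float)
    # Start from the hull vertex nearest to g.
    z = P[np.argmin(np.linalg.norm(P - g, axis=1))].copy()
    for _ in range(max_iter):
        grad = z - g                       # gradient of 0.5 * ||z - g||^2
        s = P[np.argmin(P @ grad)]         # vertex minimizing the linearized objective
        gap = grad @ (z - s)               # Frank-Wolfe duality gap (nonnegative)
        if gap <= gap_tol:
            break
        step = min(1.0, gap / ((z - s) @ (z - s)))   # exact line search on [0, 1]
        z = z + step * (s - z)
    return float(np.linalg.norm(z - g))

def enclosing_knn(X, g, tol=1e-4):
    """Indices of the k* nearest neighbors, with k* the smallest k reaching D(g, X) within tol."""
    X = np.asarray(X, dtype=float)
    g = np.asarray(g, dtype=float)
    order = np.argsort(np.linalg.norm(X - g, axis=1))   # training points sorted by distance to g
    target = distance_to_enclosure(X, g)                # D(g, X)
    for k in range(1, len(X) + 1):
        if distance_to_enclosure(X[order[:k]], g) <= target + tol:
            return order[:k]
    return order
```

The iteration converges quickly when $g$ lies well inside the hull; when $g$ is outside or near the boundary, the computed distance is only an upper approximation, so the returned k may be larger than the exact $k^*$.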

Fig. 1. In the left figure, the natural neighbors neighborhood $\mathcal{J}_g^{NN}$ is marked with solid circles. For reference, the Voronoi diagram of this
set is dashed. In the center figure the natural neighbors inclusive neighborhood $\mathcal{J}_g^{NN\text{-}i}$ is marked with solid circles; notice that $x_5 \in \mathcal{J}_g^{NN\text{-}i}$.
The shaded area indicates the inclusion radius $\max_{x_j \in \mathcal{J}_g^{NN}} \|g - x_j\|_2$. In the right figure, the enclosing k-NN neighborhood $\mathcal{J}_g^{kNN}$ is
marked with solid circles.


III.  Color  Management 


Our  implementation  of  printer  color  management  follows  the  standard  calibration  and  characterization  approach 
described  in  [6,  Section  5].  The  architecture  is  divided  into  calibration  and  characterization  tables  in  part  to  reduce 
the  work  needed  to  maintain  the  color  reproduction  accuracy,  which  may  drift  due  to  changes  in  the  ink,  substrate, 
temperature,  etc.  This  empirical  approach  is  based  on  measuring  the  way  the  device  transforms  device-dependent 
input colors (i.e. RGB) to printed device-independent colors (i.e. CIELAB). First, $n$ color patches are printed
and measured to form the training data $X, y$, where $X = \{x_i\}_{i=1:n}$ are the measured CIELAB values and
$y = \{(y_{R_i}, y_{G_i}, y_{B_i})\}_{i=1:n}$ are the corresponding RGB values input to the printer. These training pairs are used to learn
the LUTs that form the color management system. The final system is shown in Figure 2: the 3D LUT implements
inverse device characterization, which is followed by calibration by parallel 1D LUTs that linearize each RGB
channel independently.


Fig. 2. A standard color management system: a desired CIELAB color is transformed to an appropriate RGB color that, when input to a
printer, results in a printed patch with approximately the desired CIELAB color.


A.  Building  the  LUTs 

The three 1D LUTs enact gray-balance calibration, linearizing each RGB channel independently. This enforces
that input neutral RGB color values with $R = G = B = d$ will print gray patches (as measured in CIELAB). That is,
if one inputs the RGB colors $(d, d, d)$ for $d \in \{0, \ldots, 255\}$, the 1D LUTs will output the RGB values that, when
printed, correspond approximately to uniformly-spaced neutral gray steps in CIELAB space. Specifically, for a
given neighborhood and regression method, the 918-sample Chromix RGB color chart is printed and measured to
form the $X, y$ training pairs. Next, the $L^*$ axis of the CIELAB space is sampled with 256 evenly-spaced values
$\mathcal{G}_{L^*} = \{(\tfrac{100}{255}(j - 1), 0, 0)\}_{j=1:256}$ to form incremental shades of gray. For each $g \in \mathcal{G}_{L^*}$, a neighborhood $\mathcal{J}_g$ is
constructed and three regressions on $\{y_{R_i} \mid x_i \in \mathcal{J}_g\}$, $\{y_{G_i} \mid x_i \in \mathcal{J}_g\}$, and $\{y_{B_i} \mid x_i \in \mathcal{J}_g\}$ fit locally linear
functions $h_c : \mathbb{R}^3 \to \mathbb{R}$ for $c = R, G, B$. Finally, the 1D LUTs are constructed with the inputs $\{1, \ldots, 256\}$ and
outputs $\{h_c(g) \mid g \in \mathcal{G}_{L^*}\}$, where $c = R, G, B$ correspond to the three 1D LUTs.

The effect of the 1D LUTs on the training data must be taken into account before the 3D LUT can be estimated.
The training set is adjusted to find $y'$ that, when input to the 1D LUTs, reproduces the original $y$. These adjusted
training sample pairs $X, y'$ are then used to estimate the 3D LUT. (Note: in our process all the LUTs are estimated
from one printed test chart, as is done in many commercial ICC profile building services. More accurate results
are possible by printing a second test chart once the 1D LUTs have been estimated, where the second test chart is
sent through the 1D LUTs before being sent to the printer.)

The 3D LUT has regularly spaced gridpoints $g \in \mathcal{G}_{L^*a^*b^*}$. For the 3D LUTs in our experiment, we used a
$17 \times 17 \times 17$ grid that spans the CIELAB color space with $L^* \in [0, 100]$ and $a^*, b^* \in [-100, 100]$. Previous studies
have shown that a finer sampling than this does not yield a noticeable improvement in accuracy [6]. For each
$g \in \mathcal{G}_{L^*a^*b^*}$ its neighborhood $\mathcal{J}_g$ is determined, and regression on $\{y'_{R_i} \mid x_i \in \mathcal{J}_g\}$, $\{y'_{G_i} \mid x_i \in \mathcal{J}_g\}$, and
$\{y'_{B_i} \mid x_i \in \mathcal{J}_g\}$ fits the locally linear functions $h_c : \mathbb{R}^3 \to \mathbb{R}$ for $c = R, G, B$.
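
The gridpoint-by-gridpoint estimation described above can be sketched as follows, assuming NumPy. The fixed k = 15 nearest-neighbor rule stands in for whichever neighborhood definition is being tested, and the function names `lab_grid` and `estimate_3d_lut` are hypothetical. The three channel regressions share the same design matrix, so they are solved in a single least-squares call.

```python
import numpy as np

def lab_grid(n=17):
    """Regular n x n x n grid spanning L* in [0, 100] and a*, b* in [-100, 100]."""
    L = np.linspace(0.0, 100.0, n)
    ab = np.linspace(-100.0, 100.0, n)
    LL, AA, BB = np.meshgrid(L, ab, ab, indexing="ij")
    return np.stack([LL.ravel(), AA.ravel(), BB.ravel()], axis=1)

def estimate_3d_lut(X_lab, Y_rgb, grid, k=15):
    """Local linear regression at every gridpoint; returns one RGB estimate per gridpoint."""
    X_lab = np.asarray(X_lab, dtype=float)
    Y_rgb = np.asarray(Y_rgb, dtype=float)
    lut = np.empty((len(grid), Y_rgb.shape[1]))
    for i, g in enumerate(grid):
        # Stand-in neighborhood: the k nearest training points in CIELAB.
        idx = np.argsort(np.linalg.norm(X_lab - g, axis=1))[:k]
        A = np.hstack([X_lab[idx], np.ones((len(idx), 1))])
        # One least-squares solve fits the R, G, and B hyperplanes together.
        coef, *_ = np.linalg.lstsq(A, Y_rgb[idx], rcond=None)
        lut[i] = np.append(g, 1.0) @ coef
    return lut
```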

Once  estimated,  the  LUTs  can  be  stored  in  an  ICC  profile.  This  is  a  standardized  color  management  format, 
developed  by  the  International  Color  Consortium  (ICC).  Input  CIELAB  colors  that  are  not  a  gridpoint  of  the  3D 
LUT  are  interpolated.  The  interpolation  technique  is  not  specified  in  the  standard;  our  experiments  used  trilinear 
interpolation [29], a three-dimensional version of the common bilinear interpolation. This interpolation technique
is computationally fast, and optimal in that it is the maximum entropy solution to the linear interpolation
equations: it weights the neighboring grid points as evenly as possible while still satisfying those
equations [3, Theorem 2, p. 776].
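
Trilinear interpolation of a LUT can be sketched as below, assuming NumPy and a LUT stored as an array with the same number of gridpoints along each axis. The function name `trilinear` and the array layout are hypothetical and are not prescribed by the ICC format.

```python
import numpy as np

def trilinear(lut, lo, hi, p):
    """Interpolate a LUT of shape (n, n, n, c), defined on a regular grid over [lo, hi], at point p."""
    lut = np.asarray(lut, dtype=float)
    lo, hi, p = (np.asarray(v, dtype=float) for v in (lo, hi, p))
    n = lut.shape[0]
    t = (p - lo) / (hi - lo) * (n - 1)                  # fractional grid coordinates
    i0 = np.clip(np.floor(t).astype(int), 0, n - 2)     # lower corner of the enclosing cell
    f = t - i0                                          # offsets in the cell, in [0, 1] for p inside the grid
    out = np.zeros(lut.shape[3])
    # Weighted sum over the eight corners of the enclosing cell.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                out += w * lut[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out
```

The eight weights are nonnegative and sum to one inside the cell, so the interpolant reproduces any affine function of the grid coordinates exactly.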

B.  Experimental  Setup 

The  different  regression  methods  were  tested  on  three  printers:  an  Epson  Stylus  Photo  2200  (ink  jet)  with  Epson 
Matte  Heavyweight  Paper  and  Epson  inks,  an  Epson  Stylus  Photo  R300  (ink  jet)  with  Epson  Matte  Heavyweight 
Paper  and  third-party  ink  from  Premium  Imaging,  and  a  Ricoh  Aficio  1232C  (laser  engine)  with  generic  laser  copy 
paper. Color measurements of the printed patches were done with a GretagMacbeth Spectrolino spectrophotometer
at a 2° observer angle with D50 illumination.

In our experiments, the calibration and characterization LUTs are estimated using local linear regression and local
ridge  regression  over  the  enclosing  neighborhood  methods  described  in  Section  II  and  a  baseline  neighborhood  of 
15  nearest  neighbors,  which  is  a  heuristic  known  to  produce  good  results  for  this  application  [9].  All  neighborhoods 
are  computed  by  Euclidean  distance  in  the  CIELAB  colorspace  and  the  regression  is  made  well-posed  by  adding 
nearest  neighbors  if  necessary  to  ensure  a  minimum  of  four  neighbors.  As  analyzed  in  Section  V,  the  enclosing 
k-NN  neighborhood  is  expected  to  have  roughly  seven  neighbors,  where  the  word  “roughly”  is  used  to  capture 
the fact that the assumptions of Theorem 2 (see Section V-A) do not hold in practice. The expected small size of
enclosing  k-NN  neighborhoods  led  us  to  also  implement  a  variation  of  the  enclosing  k-NN  neighborhood  which  uses 
a  minimum  of  k  =  15  neighbors:  this  is  achieved  by  adding  nearest  neighbors  to  the  enclosing  k-NN  neighborhood 
if  there  are  fewer  than  15.  Note  that  this  variant  is  also  an  enclosing  neighborhood,  but  ensures  smoother  regressions 
than  the  enclosing  k-NN  neighborhood. 

The ridge parameter $\lambda$ in (2) was fixed at $\lambda = 0.1$ for all the experiments. This parameter value was chosen
based on a small preliminary experiment, which suggested that values of $\lambda$ from $\lambda = 0.001$ to $\lambda = 2$ would produce
similar results. Note that the effect of the regularization parameter is highly nonlinear, and that steeper slopes
(higher values of $\beta$) are more strongly affected by the regularization. It is common wisdom that a small amount of
regularization can be very helpful in reducing estimation variance, but larger amounts of regularization can cause
unwanted bias, resulting in oversmoothing.

To  compare  the  color  management  systems  created  by  each  neighborhood  and  regression  method,  729  RGB  test 
color  values  were  drawn  randomly  and  uniformly  from  the  RGB  colorspace,  printed  on  each  printer,  and  measured 
in  CIELAB.  These  measured  CIELAB  values  formed  the  test  samples.  This  process  guaranteed  that  the  CIELAB 
test  samples  were  in  the  gamut  for  each  printer,  but  each  printer  had  a  slightly  different  set  of  CIELAB  test  samples. 
The  test  samples  were  then  input  as  the  “Desired  CIELAB”  values  to  test  the  accuracy  of  each  estimated  LUT, 
as  shown  in  Figure  2.  Each  estimated  LUT  produced  estimated  RGB  values  that,  when  sent  to  the  printer,  would 
ideally  yield  the  test  sample  CIELAB  values.  The  different  estimated  RGB  values  were  sent  to  the  printer,  printed, 
measured in CIELAB, and the $\Delta E^*_{94}$ error was computed with respect to the test sample CIELAB values. The
$\Delta E^*_{94}$ error metric is one standard way to measure color management error [6].
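
For reference, the CIE 1994 color difference can be computed as in the sketch below, which uses only the Python standard library. The parametric factors shown (k_L = k_C = k_H = 1, K1 = 0.045, K2 = 0.015) are the commonly used graphic-arts values and are an assumption; the text does not state which values were used. The formula is asymmetric: the weights depend on the chroma of the first (reference) color.

```python
import math

def delta_e94(lab_ref, lab_sample, k_L=1.0, k_C=1.0, k_H=1.0, K1=0.045, K2=0.015):
    """CIE94 color difference between a reference and a sample CIELAB color."""
    L1, a1, b1 = lab_ref
    L2, a2, b2 = lab_sample
    C1 = math.hypot(a1, b1)   # chroma of the reference
    C2 = math.hypot(a2, b2)   # chroma of the sample
    dL = L1 - L2
    dC = C1 - C2
    # Squared hue difference; clamped at zero to guard against round-off.
    dH2 = max(0.0, (a1 - a2) ** 2 + (b1 - b2) ** 2 - dC ** 2)
    S_L, S_C, S_H = 1.0, 1.0 + K1 * C1, 1.0 + K2 * C1
    return math.sqrt((dL / (k_L * S_L)) ** 2
                     + (dC / (k_C * S_C)) ** 2
                     + dH2 / (k_H * S_H) ** 2)
```

For neutral colors the weights reduce to one, so a pure lightness difference of 2 gives an error of exactly 2, matching the Euclidean distance in CIELAB.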

IV. Results

Tables I, II, and III show the average $\Delta E^*_{94}$ error and 95th percentile error for the three printers for each
neighborhood definition and each regression method. In addition, we discuss in this section which differences are
statistically significant, as judged at the .05 significance level by Student’s matched-pair t-test. These
three metrics (average, 95th percentile, and statistical significance) summarize different aspects of the results, and
are complementary in that good performance with respect to one of the three metrics does not necessarily imply
good performance with respect to the other two metrics. The baseline is the k = 15 neighbors with local linear
regression. Small errors may not be noticeable; though noticeability varies throughout the color space and between
people, errors under 2 $\Delta E^*_{94}$ are generally not noticeable.
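
The matched-pair t statistic compares two methods on the same test samples by working with the per-sample error differences. The sketch below uses only the Python standard library; the function name `paired_t_statistic` is hypothetical, and the p-value lookup is omitted. For the 729 test samples used here, the two-sided critical value at the .05 level is roughly 1.96.

```python
import math

def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees of freedom) for paired per-sample errors of two methods."""
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    mean = sum(d) / n
    # Unbiased sample variance of the paired differences.
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n), n - 1
```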


TABLE I

$\Delta E^*_{94}$ Errors from the Ricoh Aficio 1232C

Neighborhood                   Regression   Average Error   95th Percentile Error
Enclosing k-NN                 Linear       4.27            8.47
Enclosing k-NN                 Ridge        3.66            7.38
Enclosing k-NN Minimum 15      Linear       4.03            8.30
Enclosing k-NN Minimum 15      Ridge        3.45            6.77
Natural Neighbors              Linear       3.74            7.55
Natural Neighbors              Ridge        3.69            7.10
Natural Neighbors Inclusive    Linear       3.74            7.63
Natural Neighbors Inclusive    Ridge        4.03            7.70
15 Neighbors                   Linear       4.41            9.84
15 Neighbors                   Ridge        4.16            8.61

The  Ricoh  laser  printer  is  the  least  linear  of  the  three  printers,  likely  due  to  the  printing  instabilities  that  are 
common  with  high-speed  laser  printers.  For  the  Ricoh,  all  of  the  enclosing  neighborhoods  have  lower  average  error 
and  lower  95th  percentile  error  than  the  baseline  of  15  neighbors  and  linear  regression.  Further,  all  of  the  methods 
were  statistically  significantly  better  than  the  15  neighbors  baseline,  except  for  enclosing  k-NN  (linear),  which 
was  not  statistically  significantly  different.  Changing  to  ridge  regression  for  15  neighbors  eliminates  over  10%  of 
the 95th percentile error. Thus, the adaptive methods and the regularized regression make a clear difference for
nonlinear color transformations.

TABLE II

$\Delta E^*_{94}$ Errors from the Epson Stylus Photo 2200

Neighborhood                   Regression   Average Error   95th Percentile Error
Enclosing k-NN                 Linear       2.32            5.01
Enclosing k-NN                 Ridge        2.20            5.03
Enclosing k-NN Minimum 15      Linear       2.34            5.00
Enclosing k-NN Minimum 15      Ridge        2.32            4.95
Natural Neighbors              Linear       2.40            5.48
Natural Neighbors              Ridge        2.20            5.16
Natural Neighbors Inclusive    Linear       2.41            5.38
Natural Neighbors Inclusive    Ridge        2.43            5.61
15 Neighbors                   Linear       2.44            6.52
15 Neighbors                   Ridge        2.43            6.46

TABLE III

$\Delta E^*_{94}$ Errors from the Epson Stylus Photo R300

Neighborhood                   Regression   Average Error   95th Percentile Error
Enclosing k-NN                 Linear       1.67            3.65
Enclosing k-NN                 Ridge        1.55            3.32
Enclosing k-NN Minimum 15      Linear       1.51            3.05
Enclosing k-NN Minimum 15      Ridge        1.52            3.10
Natural Neighbors              Linear       1.71            3.49
Natural Neighbors              Ridge        1.54            2.87
Natural Neighbors Inclusive    Linear       1.77            3.55
Natural Neighbors Inclusive    Ridge        1.79            3.55
15 Neighbors                   Linear       1.55            3.32
15 Neighbors                   Ridge        1.55            3.34

For the Ricoh laser printer, the lowest average error and lowest 95th percentile error are produced by enclosing
k-NN minimum 15 (ridge): the 95th percentile error is reduced by 21% over 15 neighbors (ridge), and the total
reduction in 95th percentile error is 31% over the baseline of 15 neighbors (linear). Enclosing k-NN minimum 15 (ridge) is statistically
significantly  better  than  all  other  methods  for  this  printer.  These  results  suggest  that  highly  nonlinear  color  transforms 
can  be  effectively  modeled  by  local  regression  using  a  lower  bound  on  the  number  of  neighbors  (to  keep  estimation 
variance  low)  but  allowing  possibly  more  neighbors  depending  on  their  spatial  distribution. 
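
The percentage reductions quoted for the Ricoh printer follow directly from the 95th percentile column of Table I. A short check, using only the Python standard library (the helper name `percent_reduction` is hypothetical):

```python
def percent_reduction(baseline, value):
    """Relative error reduction, in percent, of value with respect to baseline."""
    return 100.0 * (baseline - value) / baseline

# 95th percentile errors from Table I (Ricoh Aficio 1232C).
p95_15nn_linear = 9.84        # baseline: 15 neighbors, linear
p95_15nn_ridge = 8.61         # 15 neighbors, ridge
p95_enclosing_min15 = 6.77    # enclosing k-NN minimum 15, ridge

vs_ridge = percent_reduction(p95_15nn_ridge, p95_enclosing_min15)      # about 21%
vs_baseline = percent_reduction(p95_15nn_linear, p95_enclosing_min15)  # about 31%
ridge_alone = percent_reduction(p95_15nn_linear, p95_15nn_ridge)       # over 10%
```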


On the Epson 2200 all of the enclosing neighborhoods have lower average and 95th percentile error than the
baseline of 15 neighbors (linear). However, only enclosing k-NN (ridge), enclosing k-NN minimum 15 (ridge), and
natural neighbors (ridge) were statistically significantly better (the other methods were not statistically significantly
different).

The  natural  neighbors  (ridge)  is  statistically  significantly  better  than  all  of  the  other  methods  except  for  enclosing 
k-NN  (linear  and  ridge),  which  are  not  statistically  significantly  different.  Enclosing  k-NN  (ridge)  is  statistically 
significantly better than all of the other methods except for natural neighbors (ridge). These results are consistent
with  the  Ricoh  results  in  that  enclosing  neighborhoods  coupled  with  ridge  regression  provide  significant  benefit. 

The Epson R300 inkjet fits the locally linear model well, as evident in the low errors across the board and the
small  average  and  95th  percentile  error  differences  between  methods.  Here,  few  methods  are  statistically  significantly 
different,  but  the  natural  neighbors  inclusive  is  statistically  significantly  worse  than  the  other  neighborhood  methods, 
including  the  baseline.  We  hypothesize  that  because  the  natural  neighbors  inclusive  creates  in  some  instances  very 
large  neighborhoods,  this  increase  in  error  may  be  caused  by  the  bias  of  oversmoothing. 

We have presented and discussed our results in terms of the $\Delta E^*_{94}$ error because it is considered a more accurate
error function for color management than $\Delta E^*_{76}$ (Euclidean distance in CIELAB) [6]. In (2) and (1) we minimize
the Euclidean error in CIELAB, because this leads to a tractable objective, whereas minimizing $\Delta E^*_{94}$ error does
not. The $\Delta E^*_{76}$ errors were also calculated and compared to the $\Delta E^*_{94}$ errors. The results were very similar in terms
of the rankings of the regression methods and the statistically significant differences.

In  summary,  the  experiments  show  that  using  an  enclosing  neighborhood  is  an  effective  alternative  to  using  a 
fixed  neighborhood  size.  In  particular,  enclosing  k-NN  minimum  15  (ridge)  achieved  the  lowest  average  and  95th 
percentile  error  rates  for  the  most  nonlinear  printer  (the  laser  printer),  and  was  either  the  best  or  a  top  performer 
throughout.  Also,  ridge  regression  showed  consistent  performance  gains  over  linear  regression,  especially  with 
smaller  neighborhoods.  Importantly,  the  overall  low  error  rates  on  the  inkjet  printers  suggest  that  the  locally  linear 
model  fits  sufficiently  well  on  these  printers,  resulting  in  less  room  for  improvement  over  the  baseline  method. 

V.  Sizes  of  Enclosing  Neighborhoods 

Enclosing neighborhoods adapt the size of the neighborhood to the local spatial distribution of the training and
test samples. In this section we consider the key question, “How many neighbors are in the neighborhood?” We
consider analytic and experimental answers to this question, and how the neighborhood size will relate to the
estimation bias and variance.

A.  Analytic  Size  of  Neighborhoods 

Asymptotically, the expected number of natural neighbors is equal to the expected number of edges meeting at a vertex
of a Delaunay triangulation [27]. A common stochastic spatial model for analyzing Delaunay triangulations is the Poisson point
process, which assumes that points are drawn randomly and uniformly such that the average density $\lambda$ is $n$ points
per volume $S$. Given the Poisson point process model, the expected number of natural neighbors is known for low
dimensions: 6 neighbors for two dimensions, $48\pi^2/35 + 2 \approx 15.5$ neighbors for three dimensions, and $340/9 \approx 37.8$
neighbors for four dimensions [27].

The following theorem establishes the expected number of neighbors in the enclosing k-NN neighborhood if the
training samples are sampled from a uniform distribution over a hypersphere about the test sample.

Theorem 2 (Asymptotic Size of Enclosing k-NN): Suppose $n$ training samples are uniformly sampled from a
distribution that is symmetric around a test sample in $\mathbb{R}^d$. Then, asymptotically as $n \to \infty$, the expected number
of neighbors in the enclosing k-NN neighborhood is $2d + 1$.

The  proof  is  given  in  Appendix  C. 
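
The statement of Theorem 2 can be checked by simulation in two dimensions, where it predicts an expected size of 2(2) + 1 = 5. The sketch below is not part of the original experiments. It assumes, as the proof in Appendix C does, that only the directions of the training samples matter and that those directions are uniform on the unit circle; the function names, the number of trials, and the seed are illustrative choices.

```python
import math
import random


def first_enclosing_count_2d(rng):
    """Draw directions uniformly on the unit circle until they surround the origin.

    Returns the number of draws needed. In 2-D the origin is strictly inside the
    convex hull exactly when no angular gap between consecutive directions
    reaches pi, i.e. the directions do not all fit in one half-plane.
    """
    angles = []
    while True:
        angles.append(rng.uniform(0.0, 2.0 * math.pi))
        if len(angles) < 3:
            continue  # fewer than d + 1 = 3 points can never enclose in 2-D
        s = sorted(angles)
        gaps = [b - a for a, b in zip(s, s[1:])]
        gaps.append(s[0] + 2.0 * math.pi - s[-1])  # wrap-around gap
        if max(gaps) < math.pi:
            return len(angles)


def mean_enclosing_size_2d(trials=20000, seed=0):
    """Monte Carlo estimate of the expected enclosing k-NN size for d = 2."""
    rng = random.Random(seed)
    total = sum(first_enclosing_count_2d(rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Theorem 2 predicts a value near 2 * 2 + 1 = 5.
    print(mean_enclosing_size_2d())
```

The estimate fluctuates with the seed and the number of trials, so it should be read as a sanity check of the asymptotic value rather than as a proof.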

For both the natural neighbors and enclosing k-NN, these analytic results model the training samples as
symmetrically distributed about the test point. This is a good model for the general asymptotic case where the number
of training samples $n \to \infty$, because if the true distribution of the training samples is smooth then the random
sampling of training samples local to the test sample will appear as though drawn from a uniform distribution.

B.  Experimental  Size  of  Neighborhoods 

The analytic neighborhood size results suggest that natural neighbors form a larger neighborhood on average than
enclosing k-NN, which we found to be true experimentally. Representative empirical histograms
of the neighborhood sizes are shown in Figure 3. They show the distribution of the neighborhood sizes for the
color management of the Ricoh printer from 918 training samples.

Fig. 3. Histograms show the frequency of each neighborhood size when estimating the gridpoints of the 3D LUT for the Ricoh Aficio
1232C.


By design, the enclosing k-NN is the smallest possible k-NN neighborhood that encloses the test point, which
should  keep  estimation  bias  relatively  low  because  the  neighbors  are  relatively  local.  The  natural  neighbors  tend 
to  form  larger  neighborhoods  than  enclosing  k-NN,  and  a  particular  natural  neighbor  could  be  close  or  far  from 
the  test  sample.  Thus  it  is  hard  to  judge  how  local  the  natural  neighbors  are.  The  natural  neighbors  inclusive  has 
relatively  large  neighborhoods,  which  suggests  that  some  of  the  natural  neighbors  must  in  fact  be  quite  far  from 
the  test  sample.  The  large  size  of  the  natural  neighbors  means  that  the  estimated  transforms  will  be  fairly  linear 
across  the  entire  colorspace,  which  can  oversmooth  the  estimation. 

One  cause  of  large  neighborhood  sizes  is  when  a  test  point  is  outside  the  convex  hull  of  the  entire  training  set. 
As  discussed  in  Section  I,  the  set  of  training  samples  may  not  span  the  full  colorspace,  resulting  in  exactly  this 
situation.  An  illustration  of  how  such  cases  affect  the  enclosing  neighborhood  sizes  is  provided  in  Figure  4.  Here, 
the enclosing k-NN neighborhood is $\{x_1, x_2\}$. From the Voronoi diagram, one can read that the natural neighbors of
$g$ are $\{x_1, x_2, x_3, x_4, x_7\}$. The largest of the neighborhoods in this case is the natural neighbors inclusive, composed
of $\{x_1, x_2, x_3, x_4, x_5, x_6, x_7\}$.

When  training  and  test  samples  are  drawn  iid  in  high-dimensional  feature  spaces,  the  test  samples  tend  to  be  on 
the  boundary  of  the  training  set,  an  effect  known  as  Bellman’s  curse  of  dimensionality  [5],  [25].  We  hypothesize 
that this effect for high-dimensional feature spaces would cause an abundance of large neighborhoods for the natural
neighbors methods.

Fig. 4. Example Voronoi diagram for the situation where the test point $g$ lies outside the convex hull of the training samples.


The  inclusion  of  possibly  far-away  points  to  “enclose”  the  test  point  may  result  in  increased  bias.  Based  on 
the  neighborhood  size  histograms  and  our  analysis  of  the  different  neighborhoods,  enclosing  k-NN  should  incur 
the  lowest  bias.  On  the  other  hand,  we  expect  natural  neighbors  inclusive  to  have  the  largest  positive  effect  on 
variance  because  a  larger  neighborhood  tends  to  lead  to  lower  estimation  variance  for  regression  problems,  though 
this  matter  is  not  so  straightforward  for  classification. 

VI.  Discussion 

We  have  proposed  the  idea  of  using  an  enclosing  neighborhood  for  local  learning  and  theoretically  motivated  it  for 
local  linear  regression.  Such  automatically  adaptively-sized  neighborhoods  can  be  useful  in  applications  where  it  is 
difficult  to  cross-validate  a  neighborhood  size,  and  in  particular  we  have  shown  that  using  enclosing  neighborhoods 
can  significantly  reduce  color  management  errors.  Local  learning  can  have  less  bias  than  more  global  estimation 
methods,  but  local  estimates  can  be  high-variance  [5].  Enclosing  neighborhoods  limit  the  estimation  variance  when 
the  underlying  function  does  have  a  (noisy)  linear  trend.  Ridge  regression  is  another  approach  to  controlling 
estimation  variance,  but  does  so  by  penalizing  the  regression  coefficients,  which  increases  bias.  In  contrast,  we 
hypothesize  that  the  effect  of  an  enclosing  neighborhood  on  bias  may  be  either  positive  or  negative,  depending  on 
the  actual  geometry  of  the  data.  It  remains  an  open  question  how  the  estimation  bias  and  variance  differ  between 
enclosing  neighborhoods  and  the  standard  k-NN,  which  uses  a  fixed,  but  cross-validated,  k  for  all  test  samples. 

The definition of the neighborhood for local learning is important, whether for local regression or for nearest
neighbors classification. We conjecture that using enclosing neighborhoods for other local learning tasks may lead
to improved performance, particularly for densely-sampled feature spaces.

Appendix A: Proof of Theorem 1

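Proof: Form the $(d+1)$-dimensional vectors $\tilde{g} = [g^T \; 1]^T$ and $\tilde{\alpha} = [\alpha^T \; \alpha_0]^T$. Let $m = |J_g|$, and re-index the training
samples so that $x_i \in J_g$ for $i = 1, \ldots, m$. Let $\tilde{X}$ denote the $(d+1) \times m$ matrix whose $i$th column is $[x_i^T \; 1]^T$.
Further, let $\hat{\tilde{\beta}} = [\hat{\beta}^T \; \hat{\beta}_0]^T$, let $y$ be the $m \times 1$ vector with $i$th component $f(x_i)$, and let $\tilde{\omega}$ be the $m \times 1$ vector with
$i$th component $\omega_i$. The least-squares regression coefficients which solve (3) are

\begin{align*}
\hat{\tilde{\beta}} &= (\tilde{X}\tilde{X}^T)^{-1}\tilde{X}y \\
&= (\tilde{X}\tilde{X}^T)^{-1}\tilde{X}(\tilde{X}^T\tilde{\alpha} + \tilde{\omega}) \\
&= \tilde{\alpha} + (\tilde{X}\tilde{X}^T)^{-1}\tilde{X}\tilde{\omega}.
\end{align*}

Note that $E[\hat{\tilde{\beta}}] = \tilde{\alpha} + (\tilde{X}\tilde{X}^T)^{-1}\tilde{X}E[\tilde{\omega}]$. Let $I$ denote the $m \times m$ identity matrix. Then the covariance matrix of
the regression coefficients is

\begin{align*}
\mathrm{cov}(\hat{\tilde{\beta}}) &= E\left[\left(\hat{\tilde{\beta}} - \tilde{\alpha} - (\tilde{X}\tilde{X}^T)^{-1}\tilde{X}E[\tilde{\omega}]\right)\left(\hat{\tilde{\beta}} - \tilde{\alpha} - (\tilde{X}\tilde{X}^T)^{-1}\tilde{X}E[\tilde{\omega}]\right)^T\right] \\
&= (\tilde{X}\tilde{X}^T)^{-1}\tilde{X}\,\mathrm{cov}(\tilde{\omega})\,\tilde{X}^T(\tilde{X}\tilde{X}^T)^{-1} \\
&= (\tilde{X}\tilde{X}^T)^{-1}\tilde{X}\,\sigma^2 I\,\tilde{X}^T(\tilde{X}\tilde{X}^T)^{-1} \\
&= \sigma^2(\tilde{X}\tilde{X}^T)^{-1}.
\end{align*}

The variance of the estimate $\hat{y} = \hat{\tilde{\beta}}^T\tilde{g}$ is

\begin{align*}
\mathrm{var}(\hat{y}) &= E\left[(\hat{y} - E[\hat{y}])^2\right] \\
&= \tilde{g}^T\,\mathrm{cov}(\hat{\tilde{\beta}})\,\tilde{g} \\
&= \sigma^2\,\tilde{g}^T(\tilde{X}\tilde{X}^T)^{-1}\tilde{g}. \tag{6}
\end{align*}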


The proof is finished by showing that $\tilde{g}^T(\tilde{X}\tilde{X}^T)^{-1}\tilde{g} \le 1$ if $J_g$ is an enclosing neighborhood.

Assume $J_g$ is an enclosing neighborhood. Then by definition it must be that $|J_g| \ge d + 1$, and that there exists
some weight vector $v$ such that $\tilde{X}v = \tilde{g}$ (which includes the constraint that $1^T v = 1$) and $v \succeq 0$. The training
and test samples are assumed to be drawn iid from a sufficiently nice distribution over the $d$-dimensional feature
space such that the training and test samples are in general position with probability one; that is, the enclosing
neighbors and test sample do not lie in a degenerate subspace, and thus it must be that the matrix $\tilde{X}$ is full rank.
Then the Moore-Penrose pseudo-inverse of $\tilde{X}$ is well-defined, and the $m \times 1$ vector $w = \tilde{X}^T(\tilde{X}\tilde{X}^T)^{-1}\tilde{g}$ is
the minimum norm solution to $\tilde{X}w = \tilde{g}$, so that $w^T w \le v^T v$ for any $v$ that satisfies $\tilde{X}v = \tilde{g}$ [30].

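Then,

\begin{align*}
w^T w &= \tilde{g}^T(\tilde{X}\tilde{X}^T)^{-1}\tilde{X}\tilde{X}^T(\tilde{X}\tilde{X}^T)^{-1}\tilde{g} \\
&= \tilde{g}^T(\tilde{X}\tilde{X}^T)^{-1}\tilde{g}. \tag{7}
\end{align*}

Because $0 \le v_j \le 1$ for each $j \in J_g$, it must be that $0 \le v_j^2 \le v_j$. Combining these facts with the property that
$w^T w \ge 0$ because it is a sum of squared elements, the following holds:

\[
0 \le w^T w \le v^T v = \sum_{j=1}^{m} v_j^2 \le \sum_{j=1}^{m} v_j = 1.
\]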


Then from (7) it must also be that $\tilde{g}^T(\tilde{X}\tilde{X}^T)^{-1}\tilde{g} \le 1$, which coupled with (6) completes the proof.
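
The bound established in this proof can be checked numerically. The sketch below builds the augmented vectors $[x_i^T \; 1]^T$ and $[g^T \; 1]^T$ from the proof, forms the quadratic form that multiplies the noise variance in (6), and evaluates it for test points that are convex combinations of the training points, so that enclosure holds by construction. The helper names, the choice of two dimensions with six neighbors, and the small Gaussian-elimination solver are illustrative assumptions, not part of the original method.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def variance_factor(points, g):
    """Return g~^T (X~ X~^T)^{-1} g~, where each vector is augmented with a 1."""
    cols = [list(p) + [1.0] for p in points]  # columns of the augmented matrix
    gt = list(g) + [1.0]
    dim = len(gt)
    xxt = [[sum(c[i] * c[j] for c in cols) for j in range(dim)] for i in range(dim)]
    z = solve(xxt, gt)
    return sum(gi * zi for gi, zi in zip(gt, z))


def random_enclosed_case(rng, d=2, m=6):
    """Random neighbors plus a test point inside their convex hull (hypothetical setup)."""
    points = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(m)]
    raw = [rng.random() for _ in range(m)]
    total = sum(raw)
    v = [r / total for r in raw]  # convex weights: nonnegative, summing to one
    g = [sum(vj * p[i] for vj, p in zip(v, points)) for i in range(d)]
    return points, g


if __name__ == "__main__":
    rng = random.Random(0)
    points, g = random_enclosed_case(rng)
    print(variance_factor(points, g))  # should not exceed 1 for an enclosed test point
```

The factor is also bounded below by $1/m$, because the minimum norm weights must sum to one; the check below uses both bounds.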

Appendix B: Method for Calculating the Enclosing k-NN Neighborhood

1) Define an $\epsilon$ that is the threshold for how small the distance to enclosure must be before considering the
neighborhood to effectively enclose the test point in its convex hull. Generally, $\epsilon$ should be small, but how
small may depend on the relative scale of the data. For the CIELAB space, where a just noticeable difference
is roughly $2\,\Delta E$, we set $\epsilon = 0.1$.

2) Re-order the set of training samples $\{x_i\}$ for $i = 1, \ldots, n$ by distance from the test point $g$ so that $x_j$ is the
$j$th nearest neighbor to $g$.

3) Add $x_1$ to the set $S$.

4) Define the indicator function $I(x_j)$, where $I(x_j) = 1$ if $x_j$ lies in the same half-space as $g$ with respect to the
hyperplane that passes through $x_1$ and is normal to the vector connecting $g$ to $x_1$, and $I(x_j) = 0$ otherwise.

5) Add to the set $S$ the training point $x^*$ nearest to $g$ in the half-space, that is

\[
x^* = \arg\min_{x_i \,:\, I(x_i) = 1} \|g - x_i\|_2.
\]

6) If the distance to enclosure $D(g, S) < \epsilon$, then stop iterating, and the set of all training samples no farther
than $x^*$ from $g$ form the enclosing k-NN neighborhood.

7) Project $g$ onto the convex hull of $S$, and denote this point $\hat{g}$. Re-define the indicator function $I(x_j) = 1$ if
$x_j$ lies in the same half-space as $g$ with respect to the hyperplane that passes through $\hat{g}$ and is normal to the
vector connecting $g$ to $\hat{g}$, and $I(x_j) = 0$ otherwise. If $I(x_j) = 0$ for all training samples, then stop iterating,
and the set of all training samples that are no farther from $g$ than the farthest member of $S$ form the enclosing k-NN
neighborhood.

8) Repeat steps 5-7 until a stopping criterion is met.
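
For intuition about the neighborhood that the steps above produce, the following is a deliberately simplified sketch restricted to two dimensions. It is not the procedure listed above: instead of half-space pruning and projection onto the convex hull, it grows k one neighbor at a time and tests enclosure with an angular-gap check, which is only valid in the plane. It assumes that no training point coincides with the test point, and it returns None when the test point cannot be enclosed, a case the listed procedure handles through the distance to enclosure. The function names are hypothetical.

```python
import math


def encloses_2d(g, pts):
    """True if g lies strictly inside the convex hull of pts (2-D only).

    The directions from g to the points must not all fit in one half-plane,
    which is equivalent to every angular gap being smaller than pi.
    """
    if len(pts) < 3:
        return False
    angles = sorted(math.atan2(y - g[1], x - g[0]) for x, y in pts)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2.0 * math.pi - angles[-1])  # wrap-around gap
    return max(gaps) < math.pi


def enclosing_knn_2d(g, train):
    """Smallest set of nearest neighbors whose convex hull contains g, or None."""
    order = sorted(train, key=lambda p: math.dist(p, g))
    for k in range(3, len(order) + 1):
        if encloses_2d(g, order[:k]):
            return order[:k]
    return None  # g is outside the convex hull of the whole training set


if __name__ == "__main__":
    train = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0), (5.0, 5.0)]
    print(enclosing_knn_2d((0.0, 0.0), train))    # three nearest points enclose g
    print(enclosing_knn_2d((10.0, 10.0), train))  # None: g cannot be enclosed
```

This brute-force search re-tests enclosure for every k, so it is meant only to illustrate the definition; the half-space pruning in steps 4-7 is what lets the listed method skip neighbors that cannot help enclose the test point.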

Appendix  C:  Proof  of  Theorem  2 
To  prove  the  theorem,  the  following  lemma  will  be  used. 

Lemma: Given $X \in \mathbb{R}^{n \times d}$, let $\mathrm{conv}(X)$ denote the convex hull of the rows $\{x_i\}_{i=1:n}$ of $X$. The origin
$0 \in \mathrm{conv}(X)$ if and only if the origin is in the convex hull of some positive scaling of the $\{x_i\}_{i=1:n}$, i.e., $0 \in \mathrm{conv}(\Lambda X)$,
where $\Lambda$ is a positive definite diagonal $n \times n$ matrix.

Proof of Lemma: Suppose $0 \in \mathrm{conv}(X)$. By definition, there exists a weight vector $w$ such that $w^T 1 = 1$, $w \succeq 0$,
and $w^T X = 0^T$. If $X$ is scaled by the positive definite diagonal matrix $\Lambda$, then it must be shown that there exists
a set of weights $w'$ with the properties that $w'^T 1 = 1$, $w' \succeq 0$, and $w'^T \Lambda X = 0^T$. Denote the normalization scalar
$z = w^T \Lambda^{-1} 1$; then it can be seen that one such weight vector that satisfies this condition is $w' = (\Lambda^{-1} w)/z$,
and we conclude that $0 \in \mathrm{conv}(\Lambda X)$. Next, suppose that $0 \notin \mathrm{conv}(X)$; then it must be shown that scaling $X$
by any positive definite diagonal matrix $\Lambda$ does not form a convex hull that contains the origin. The proof is by
contradiction: assume that $0 \in \mathrm{conv}(\Lambda X)$ but $0 \notin \mathrm{conv}(X)$. The first part of this proof could be applied, scaling
$\Lambda X$ by $\Lambda^{-1}$, which would lead to the conclusion that $0 \in \mathrm{conv}(X)$, thus forming a contradiction.
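
The weight vector used in the first half of the lemma's proof can be verified directly. The short derivation below spells out that check, writing $\Lambda$ for the positive definite diagonal scaling matrix and using only the symbols defined in the proof ($w$, $w'$, $z$). It relies on $\Lambda$ having strictly positive diagonal entries, so that $\Lambda^{-1}$ preserves nonnegativity and $z > 0$.

```latex
\begin{align*}
z &= w^T \Lambda^{-1} 1 > 0
   && \text{since } w \succeq 0,\ w^T 1 = 1,\ \Lambda^{-1} \text{ has positive diagonal}, \\
w' &= \tfrac{1}{z}\,\Lambda^{-1} w \succeq 0, \\
w'^T 1 &= \tfrac{1}{z}\, w^T \Lambda^{-1} 1 = \tfrac{z}{z} = 1, \\
w'^T \Lambda X &= \tfrac{1}{z}\, w^T \Lambda^{-1} \Lambda X = \tfrac{1}{z}\, w^T X = 0^T .
\end{align*}
```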

Now we begin the body of the proof of Theorem 2. Without loss of generality, assume the test point is the
origin, $g = 0$. Let $X$ be the random $n \times d$ matrix with rows $\{x_i\}_{i=1:n}$ drawn independently and identically from
a symmetric distribution in $\mathbb{R}^d$. Rearrange the rows of $X$ so that they are sorted such that $\|x_k\|_2 \le \|x_{k+1}\|_2$
for all $k$. As established in the lemma, without a loss of generality with respect to the event $0 \in \mathrm{conv}(X)$,
scale all rows such that $\|x_j\|_2 = 1$ for all $j$. Then $0 \in \mathrm{conv}(X)$ if and only if $n > d$ and the row vectors are not
all contained in some hemisphere [31]. Let $H_n$ indicate the event that $n$ vectors lie on the same hemisphere, and
$\bar{H}_n$ will denote the complement of $H_n$. Wendel [32] showed that for $n$ points chosen uniformly on a hypersphere
in $\mathbb{R}^d$,

\[
P(H_n) = 2^{-n+1} \sum_{k=0}^{d-1} \binom{n-1}{k} \qquad \forall\, n \ge 1. \tag{8}
\]


Let $F_n$ be the event that the first $n$ ordered points enclose the origin, but the first $n - 1$ ordered points do not
enclose the origin. The probability of the event $F_n$ is

\begin{align*}
P(F_n) &= P(\bar{H}_n, H_{n-1}) \\
&= P(\bar{H}_n \mid H_{n-1})\, P(H_{n-1}) \\
&= \left(1 - P(H_n \mid H_{n-1})\right) P(H_{n-1}) \\
&= P(H_{n-1}) - P(H_n, H_{n-1}) \\
&= P(H_{n-1}) - P(H_n), \tag{9}
\end{align*}

where the last line holds because the event $H_n$ implies the event $H_{n-1}$.


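Because one or zero points cannot complete a convex hull around the origin, $P(F_0) = 0$ and $P(F_1) = 0$. Combining
(8) and (9), and using the recurrence relation of the binomial coefficient

\[
\binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1}, \tag{10}
\]

\begin{align*}
P(F_n) &= 2^{-n+2} \sum_{k=0}^{d-1} \binom{n-2}{k} - 2^{-n+1} \sum_{k=0}^{d-1} \binom{n-1}{k} \\
&= 2^{-n+1} \sum_{k=0}^{d-1} \left[ 2\binom{n-2}{k} - \binom{n-1}{k} \right] \\
&= 2^{-n+1} \sum_{k=0}^{d-1} \left[ 2\binom{n-2}{k} - \binom{n-2}{k} - \binom{n-2}{k-1} \right] \\
&= 2^{-n+1} \sum_{k=0}^{d-1} \left[ \binom{n-2}{k} - \binom{n-2}{k-1} \right] \\
&= 2^{-n+1} \left[ \binom{n-2}{d-1} - \binom{n-2}{d-2} + \binom{n-2}{d-2} - \binom{n-2}{d-3} + \ldots + \binom{n-2}{0} - \binom{n-2}{-1} \right] \\
&= 2^{-n+1} \binom{n-2}{d-1} \qquad \text{for all } n \ge 2, \tag{11}
\end{align*}

where the last line follows because $\binom{r}{-1} = 0$ for all $r$ [33, p. 154]. Then,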


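\[
E[F_a] = \sum_{a=2}^{\infty} a\, P(F_a) = \sum_{a=2}^{\infty} a\, 2^{-a+1} \binom{a-2}{d-1}.
\]

To simplify, change variables to $b = a - 2$,

\begin{align*}
E[F_a] &= \lim_{n \to \infty} \sum_{b=0}^{n-2} 2^{-b-1} (b+2) \binom{b}{d-1} \\
&= \lim_{n \to \infty} \sum_{b=0}^{n-2} 2^{-b-1} \left[ \frac{b\, b!}{(d-1)!\,(b-d+1)!} + \frac{2\, b!}{(d-1)!\,(b-d+1)!} \right] \\
&= \lim_{n \to \infty} \sum_{b=0}^{n-2} 2^{-b-1} \left[ \frac{(b+1)\, b!}{(d-1)!\,(b-d+1)!} + \frac{b!}{(d-1)!\,(b-d+1)!} \right] \\
&= \lim_{n \to \infty} \sum_{b=0}^{n-2} 2^{-b-1} \left[ d\, \frac{(b+1)!}{d!\,(b-d+1)!} + \frac{b!}{(d-1)!\,(b-d+1)!} \right] \\
&= \lim_{n \to \infty} \sum_{b=0}^{n-2} 2^{-b-1} \left[ d \binom{b+1}{d} + \binom{b+1}{d} - \binom{b}{d} \right] \\
&= \lim_{n \to \infty} \sum_{b=0}^{n-2} 2^{-b-1} \left[ (d+1) \binom{b+1}{d} - \binom{b}{d} \right],
\end{align*}

where the second to last line follows from (10). Expand the recurrence,

\begin{align*}
E[F_a] &= \lim_{n \to \infty} \left[ 2^{-1}(d+1)\binom{1}{d} - 2^{-1}\binom{0}{d} + 2^{-2}(d+1)\binom{2}{d} - 2^{-2}\binom{1}{d} + \ldots \right] \\
&= \lim_{n \to \infty} \left[ -2^{-1}\binom{0}{d} + 2^{-2}\left(2(d+1) - 1\right)\binom{1}{d} + \ldots + 2^{-n+1}\left(2(d+1) - 1\right)\binom{n-2}{d} + 2^{-n+1}(d+1)\binom{n-1}{d} \right] \\
&= \lim_{n \to \infty} \left[ -2^{-1}\binom{0}{d} + \left( \sum_{i=1}^{n-2} 2^{-i-1}(2d+1)\binom{i}{d} \right) + 2^{-n+1}(d+1)\binom{n-1}{d} \right].
\end{align*}

The first and third terms in this equation converge to zero as $n \to \infty$, leaving

\[
E[F_a] = (2d+1) \sum_{i=1}^{\infty} 2^{-i-1} \binom{i}{d}.
\]

Using the identity $\binom{i}{d} = 0$ for $0 \le i < d$ [33, p. 155], and the summation [33, p. 199]

\[
\frac{z^d}{(1 - z)^{d+1}} = \sum_{i=0}^{\infty} \binom{i}{d} z^i
\]

with $z = 1/2$, establishes the result: $E[F_a] = 2d + 1$.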
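
The closed forms used in this proof can be cross-checked with exact rational arithmetic. The sketch below evaluates Wendel's hemisphere probability (8), forms the first-enclosure probability as the difference in (9), and sums the resulting series up to a finite cutoff. The function names and the cutoff `n_max` are assumptions chosen for illustration; the truncated sum only approximates the infinite series, although the neglected tail is negligible at this cutoff.

```python
from fractions import Fraction
from math import comb


def p_hemisphere(n, d):
    """Wendel's formula (8): probability that n uniform points share a hemisphere in R^d, n >= 1."""
    return Fraction(sum(comb(n - 1, k) for k in range(d)), 2 ** (n - 1))


def p_first_enclose(n, d):
    """Equation (9): probability that the n-th point is the first to complete enclosure, n >= 2."""
    return p_hemisphere(n - 1, d) - p_hemisphere(n, d)


def expected_size(d, n_max=400):
    """Truncated series for the expected enclosing k-NN size in d dimensions."""
    return sum(n * p_first_enclose(n, d) for n in range(2, n_max + 1))


if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        # Each printed value should be very close to 2 * d + 1.
        print(d, float(expected_size(d)))
```

Because the probabilities are kept as exact fractions, the difference in (9) can also be compared term by term against the closed form (11), which is what the accompanying checks do.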